Exploration and Exploitation in Parkinson’s Disease: Behavioral Analyses
Authors
Affiliations
Björn Meder
Health and Medical University, Potsdam, Germany
Martha Sterf
Medical School Berlin, Berlin, Germany
Charley M. Wu
University of Tübingen, Tübingen, Germany
Matthias Guggenmos
Health and Medical University, Potsdam, Germany
Published
July 25, 2025
Code
# Housekeeping: Load packages and helper functions
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(message = FALSE)
knitr::opts_chunk$set(warning = FALSE)
knitr::opts_chunk$set(fig.align = 'center')
knitr::opts_chunk$set(prefer_html = TRUE)
options(knitr.kable.NA = '')
options(brms.backend = "cmdstanr") # more stability for M1 Mac

packages <- c('gridExtra', 'BayesFactor', 'tidyverse', "RColorBrewer", "lme4",
              "sjPlot", "lsr", "brms", "kableExtra", "afex", "emmeans",
              "viridis", "ggpubr", "hms", "scales", "cowplot", "gtsummary",
              "webshot", "webshot2", "parameters", "bridgesampling", "cmdstanr")
lapply(packages, require, character.only = TRUE)
set.seed(0815)

# File with various statistical functions; among other things it provides
# tests for Bayes Factors (BFs)
source('statisticalTests.R')

# Wrapper for brm models: saves the full model the first time it is run,
# otherwise loads it from disk
run_model <- function(expr, modelName, path = 'brm', reuse = TRUE) {
  path <- paste0(path, '/', modelName, ".brm")
  fit <- if (reuse) suppressWarnings(try(readRDS(path), silent = TRUE)) else NULL
  if (is.null(fit) || is(fit, "try-error")) {
    fit <- eval(expr)
    saveRDS(fit, file = path)
  }
  fit
}

# Setting some plotting params
w_box <- 0.2             # width of boxplot, also used for jittering points and lines
line_jitter <- w_box / 2
xAnnotate <- -0.3

# jitter params
jit_height <- 0.01
jit_width <- 0.05
jit_alpha <- 0.6

# colors
groupcolors <- c("#7570b3", "#1b9e77", "#d95f02")
choice3_colors <- c("#e7298a", "#66a61e", "#e6ab02")
We investigated how patients with Parkinson’s disease (PD) balance the explore-exploit trade-off using a spatially correlated bandit task, where the spatial structure of rewards facilitated value generalization (i.e., nearby options yield similar rewards). Participants were tested either shortly after taking their regular Levodopa (L-Dopa) dose (N=33) or just before their next scheduled dose (N=32). Patients with polyneuropathy served as a control group (N=34), comparable in age, depressive symptoms, and basic cognitive functioning. Behavioral and computational analyses revealed distinct patterns of exploration and exploitation. PD patients on L-Dopa balanced exploration and exploitation, though not as efficiently as polyneuropathy patients. In stark contrast, patients off L-Dopa rarely exploited known high-value options and primarily explored novel ones. This overreliance on exploration impaired their ability to navigate the explore-exploit trade-off and maximize rewards. To better understand the mechanisms underlying these behavioral differences, we employed a computational approach using the Gaussian Process Upper Confidence Bound (GP-UCB) model. This model integrates similarity-based generalization with two distinct exploration mechanisms: directed exploration, which seeks to reduce uncertainty about rewards, and random exploration, which introduces stochastic variability in choice behavior. The model parameters showed that behavioral differences between the on- and off-medication conditions were primarily driven by differences in uncertainty-directed exploration, while the level of random exploration remained unchanged. Both PD groups showed reduced generalization compared to controls, contributing to poorer overall performance. Our findings indicate that L-Dopa selectively modulates uncertainty-directed exploration, providing a more nuanced understanding of the central role of dopamine in the regulation of exploratory behavior.
We investigated how patients with Parkinson’s disease (PD) manage the explore-exploit trade-off using a spatially correlated multi-armed bandit task. Participants accumulated rewards by selecting tiles (options) with normally distributed rewards. The spatial correlation between rewards facilitated generalization, allowing participants to adapt to the structure of the environment and balance exploring new options versus exploiting known high-reward options.
Screenshot from experiment, procedure and example environments
3.1 Materials and procedure
40 distinct environments were generated using a radial basis function kernel with \(\lambda = 1\), creating a bivariate reward function on a grid that maps each tile location to a specific reward value. These reward functions varied gradually across the grid, creating environments with spatially correlated rewards. The correlation between neighboring options is approximately r = 0.6, decreasing exponentially with distance.
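As an illustration, one such environment can be sampled from a Gaussian process with a radial basis function kernel. This is a hypothetical sketch, not the original generation script: the grid size (8 × 8) and the kernel parameterization \(k(x, x') = \exp(-\lVert x - x'\rVert^2 / (2\lambda^2))\) are assumptions, chosen because with \(\lambda = 1\) neighboring tiles then correlate at \(\exp(-0.5) \approx 0.61\), consistent with the r ≈ 0.6 stated above.

```r
# Hypothetical sketch: sample one spatially correlated reward environment
# from a GP with an RBF kernel (lambda = 1). Grid size and kernel
# parameterization are illustrative assumptions, not the study's settings.
grid_size <- 8
coords <- expand.grid(x = 1:grid_size, y = 1:grid_size)
lambda <- 1

d2 <- as.matrix(dist(coords))^2         # squared Euclidean distances between tiles
K <- exp(-d2 / (2 * lambda^2))          # RBF (squared-exponential) kernel

set.seed(1)
L <- chol(K + diag(1e-6, nrow(K)))      # small jitter for numerical stability
f <- as.vector(t(L) %*% rnorm(nrow(K))) # one draw from N(0, K)

reward <- (f - min(f)) / (max(f) - min(f)) * 50  # rescale to the task range [0, 50]
K[1, 2]  # correlation between horizontally adjacent tiles: exp(-0.5)
```

`reward` can then be reshaped into a `grid_size` × `grid_size` matrix of tile values.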
Participants completed 10 rounds of the task, each featuring a new environment drawn without replacement from the set of 40 environments. In each round, participants had 25 choices to accumulate rewards. The first round served as a tutorial to familiarize participants with the task and was excluded from the analyses. The final round (round 10) was a bonus round where, after 15 choices, participants were asked to predict rewards for five unrevealed options. Data from this round were also excluded from the main analysis and analyzed separately.
At the start of each round, one tile was randomly revealed, after which participants made 25 sequential selections. On each trial, they could either click a new tile or re-click a previously selected one. Selections were made by clicking a tile on the computer screen with the mouse, upon which participants received a reward arbitrarily scaled to the range [0, 50]. Re-clicked tiles showed small variations in reward due to normally distributed noise.
3.2 Sample
We collected data from adult participants with Parkinson’s disease (PD) who regularly receive Levodopa (L-Dopa) for symptomatic treatment (Abbott, 2010; Tambasco et al., 2018). Participants were recruited via a neurologist’s outpatient practice. Eligibility was assessed based on Hoehn-Yahr scores recorded in the patient files. The scale rates disease severity and motor impairment on a score from 1 to 5, with higher scores indicating greater severity (Goetz et al., 2004; Hoehn & Yahr, 1967). We limited recruitment to individuals with scores between 1 and 3, as scores of 4 and 5 reflect severe impairment.
PD patients were randomly assigned to two conditions: on medication (PD+) and off medication (PD-). In the PD+ group (N=33), patients’ scheduled L-Dopa was administered at least 30 minutes before the start of the experiment. In the PD- group (N=32), testing was scheduled shortly before the next L-Dopa dose, such that participants were in a low-dopamine state during the experiment, offering a clear contrast to the PD+ group. Thus, we refer to the ‘on medication’ condition as the state after taking L-Dopa and the ‘off medication’ condition as the state before the next scheduled dose.
The comparison group (N=34) consisted of individuals of similar age diagnosed with polyneuropathies (Control), conditions affecting the peripheral nervous system that can lead to physical symptoms such as pain, sensory loss, or motor weakness. Unlike Parkinson’s disease, however, polyneuropathy does not involve central dopaminergic dysfunction or cognitive impairment.
3.3 Clinical assessment
To characterize participants’ clinical status, we employed standardized measures assessing Parkinson’s disease severity, basic cognitive function, and depressive symptoms. PD severity was evaluated using the Hoehn-Yahr scale, which rates motor impairments such as postural instability and gait difficulties (Hoehn & Yahr, 1967). Participants receive a score between one and five, with higher scores indicating more severe problems. Basic cognitive function of all participants was assessed with the Mini-Mental State Examination (MMSE), which is frequently used in patients with dementia (Folstein et al., 1975). The test comprises 30 questions pertaining to different domains, including memory (e.g., recalling three objects), temporal and spatial orientation (e.g., date and location), and arithmetic ability. Finally, all participants answered the German version of the Beck Depression Inventory II, a self-report inventory consisting of 21 items measuring depressive symptoms (Beck et al., 1996; Hautzinger et al., 2006).
3.4 Sample characteristics
Note
exclude participant with pump
Table 1 shows the demographics of our sample, along with their Hoehn-Yahr, MMSE, and BDI scores. In the PD+ group, the mean time since the last L-Dopa dose was 104 min; in the PD- group it was 253 min.
All behavioral data are stored in data_gridsearch_parkinson.csv, which contains the following variables (Table 2):
id: participant id
age: participant age in years
gender: (m)ale, (f)emale, (d)iverse
x and y: the sampled coordinates on the grid
chosen: the chosen tile, encoded by its x and y coordinates
z: the reward obtained from the chosen tile, normalized to the range 0-1. Re-clicked tiles could show small variations in the observed color (i.e., underlying reward) due to normally distributed noise, \(\epsilon \sim \mathcal{N}(0, 1)\)
z_scaled: the observed outcome (reward), scaled in each round to a randomly drawn maximum value in the range of 70% to 90% of the highest reward value
trial: the trial number (0-25), with 0 corresponding to the initially revealed random tile, i.e., trial 1 is the first choice
round: the round number (1 through 10), with 1 = practice round (not analyzed) and 10 = bonus round (analyzed only for bonus round judgments)
distance: the Manhattan distance between consecutive clicks. NA for trial 0, the initially revealed random tile
type_choice: categorizes consecutive clicks as “repeat” (clicking the same tile as on the previous trial), “near” (clicking a directly neighboring tile, i.e., distance = 1), or “far” (clicking a tile with distance > 1). NA for trial 0, i.e., the initially revealed random tile
previous_reward: the reward z obtained on the previous trial. NA for trial 0, i.e., the initially revealed random tile
last_ldopa: time of the last L-Dopa dose (HH:MM)
next_ldopa: scheduled time of the next L-Dopa dose (HH:MM)
time_exp: time of the experiment (HH:MM)
time_since_ldopa: time since the last L-Dopa dose (in minutes)
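As a sanity check on the derived variables, the following minimal sketch recomputes distance and type_choice from the raw coordinates, per participant and round. The column names follow the data dictionary above, but the four-trial data frame is toy data constructed for illustration.

```r
# Minimal sketch: recompute `distance` (Manhattan) and `type_choice`
# from the x/y coordinates on toy data.
library(dplyr)

demo <- tibble::tibble(
  id = 1, round = 2, trial = 0:3,
  x = c(3, 3, 4, 7),
  y = c(5, 5, 5, 2)
)

demo <- demo %>%
  group_by(id, round) %>%
  arrange(trial, .by_group = TRUE) %>%
  mutate(
    distance = abs(x - lag(x)) + abs(y - lag(y)),  # NA for trial 0
    type_choice = case_when(
      is.na(distance) ~ NA_character_,
      distance == 0   ~ "repeat",
      distance == 1   ~ "near",
      TRUE            ~ "far"
    )
  ) %>%
  ungroup()

demo$type_choice  # NA "repeat" "near" "far"
```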
We analyzed the behavioral data in terms of performance and exploration behavior. These analyses exclude the tutorial and bonus rounds, leaving a total of 200 search decisions (8 rounds \(\times\) 25 trials) for each participant. We then report the results of the bonus round, where we analyze participants’ reward predictions and confidence judgments. We report both frequentist statistics and Bayes factors (\(BF\)) to quantify the relative evidence of the data in favor of the alternative hypothesis (\(H_A\)) over the null hypothesis (\(H_0\)); see Appendix for details and references. Various helper functions are implemented in statisticalTests.R. Regression analyses were performed in a Bayesian framework with Stan, accessed via the R package brms, and complemented by frequentist hierarchical regression analyses (via the lmer function from package lme4).
# Mean reward per subject (practice and bonus round excluded)
df_mean_reward_subject_by_round <- dat %>%
  filter(trial != 0 & round %in% 2:9) %>%  # exclude first (randomly revealed) tile, practice round, and bonus round
  group_by(id, round) %>%
  summarise(age = mean(age),
            group = first(group),
            sum_reward = sum(z),
            mean_reward = mean(z),
            sd_reward = sd(z))

df_summary_by_round <- df_mean_reward_subject_by_round %>%
  group_by(round, group) %>%
  summarize(mean_of_means = mean(mean_reward, na.rm = TRUE),  # renaming to avoid confusion
            se_reward = sd(mean_reward, na.rm = TRUE) / sqrt(n()),  # standard error
            .groups = 'drop')

aov_rounds <- aov_ez(id = "id", dv = "mean_reward", within = "round",
                     between = "group", data = df_mean_reward_subject_by_round)

# kable(as.data.frame(aov_rounds$anova_table),
#       format = "html", escape = FALSE, digits = 2,
#       caption = "ANOVA results with round as within-subjects factor and group as between-subjects factor, where rewards per round were first aggregated within subjects.") %>%
#   kable_styling("striped", full_width = FALSE)

brm_rounds <- run_model(
  expr = quote(brm(
    mean_reward ~ round * group + (1 | id),
    data = df_mean_reward_subject_by_round,
    family = gaussian(),
    iter = 4000,
    warmup = 1000,
    chains = 4,
    cores = 4,
    seed = 0511,
    backend = "cmdstanr",
    save_pars = save_pars(all = TRUE)
  )),
  modelName = 'brm_reward_rounds')

# Extract fitted values and add to data df
fitted_values <- fitted(brm_rounds, re_formula = NA)
df_mean_reward_subject_by_round$fitted_mean_reward <- fitted_values[, "Estimate"]

p <- ggplot(df_mean_reward_subject_by_round,
            aes(x = round, y = mean_reward, group = group, shape = group, color = group)) +
  geom_point(data = df_summary_by_round,
             aes(x = round, y = mean_of_means, shape = group), size = 3) +
  geom_line(aes(y = fitted_mean_reward), linewidth = 1) +
  geom_jitter(aes(x = round, y = mean_reward), size = 1, alpha = 0.3, width = 0.2) +
  scale_y_continuous("Mean Reward", breaks = c(25, 30, 35)) +
  xlab("Round") +
  scale_fill_manual(values = groupcolors) +
  scale_color_manual(values = groupcolors) +
  ggtitle("Mean Reward by Rounds and Group (brms)") +
  theme_classic() +
  theme(legend.title = element_blank())

# tbl_regression(brm_rounds, exponentiate = F)
# tab_model(brm_rounds)

# Reduced model for computing BF: no round term
brm_rounds_reduced <- run_model(
  brm(
    mean_reward ~ group + (1 | id),
    data = df_mean_reward_subject_by_round,
    family = gaussian(),
    iter = 4000, warmup = 1000, chains = 4, cores = 4, seed = 0511,
    backend = "cmdstanr",
    save_pars = save_pars(all = TRUE)),
  modelName = 'brm_reward_rounds_reduced')

# Compute Bayes Factor: Full vs. Reduced (without round term)
# bf_brm_rounds <- bayes_factor(brm_rounds, brm_rounds_reduced)

# format_parameters(brm_rounds)
# params <- model_parameters(brm_rounds) |> print_md()
# params$Parameter <- gsub("groupPDP", "PD+", params$Parameter)
# params$Term <- gsub("round", "Round", params$Term)
# plot_model(brm_rounds, type = "est") +
#   theme_classic()
Figure 1 shows the obtained rewards by round, for each group. An ANOVA with round as within- and group as between-subjects factor showed a difference between groups, with Control patients achieving the greatest rewards, followed by PD+ and PD- patients, but no change across rounds (and no interaction). A Bayesian regression analysis yielded comparable results, with the estimated effect of round on mean reward being very small (Estimate = 0, 95% CI [-0.01, 0.01]) and the credible interval including zero. In the subsequent analyses, we therefore aggregate across rounds.
Code
# Plot the mean reward by round for each group with dodged points and error bars
ggplot(df_summary_by_round,
       aes(x = round, y = mean_of_means, group = group, shape = group,
           color = group, fill = group)) +
  geom_line(position = position_dodge(width = 0.3)) +
  # geom_errorbar(aes(ymin = mean_of_means - 1.96 * se_reward, ymax = mean_of_means + 1.96 * se_reward),
  #               width = 0.2, position = dodge, color = "black") +
  geom_errorbar(aes(ymin = mean_of_means - se_reward, ymax = mean_of_means + se_reward),
                width = 0.2, position = position_dodge(width = 0.3), alpha = 0.7) +
  # geom_point(position = position_dodge(width = 0.3), size = 3, stroke = 1, alpha = .9) +
  # coord_cartesian(ylim = c(20, 40)) +
  coord_cartesian(ylim = c(0.4, 0.75)) +
  # scale_shape_manual(values = c(21, 24, 22)) +  # circle, triangle, and square
  scale_fill_manual(values = groupcolors) +
  scale_color_manual(values = groupcolors) +
  # scale_color_manual(values = c("black", "black", "black")) +
  # scale_y_continuous("Mean reward ± 95% CI") +
  scale_y_continuous("Mean reward ± SE") +
  scale_x_continuous("Round", breaks = 2:9) +
  theme_classic() +
  theme(legend.title = element_blank(),
        plot.title = element_text(size = 16),
        axis.text = element_text(size = 14),
        axis.title = element_text(size = 14))

ggsave("plots/performance_rounds.png", width = 6, height = 3)
Figure 1: Performance over rounds (excluding tutorial and bonus round).
4.2 Performance: Rewards by group
Code
df_mean_reward_subject <- dat %>%
  filter(trial != 0 & round %in% 2:9) %>%  # exclude first (randomly revealed) tile, practice round, and bonus round
  group_by(id) %>%
  summarise(age = mean(age),
            group = first(group),
            sum_reward = sum(z),
            mean_reward = mean(z),
            sd_reward = sd(z),
            BDI = first(BDI),
            MMSE = first(MMSE),
            hoehn_yahr = first(hoehn_yahr))

# Some summary stats for obtained mean rewards
# df_mean_reward_subject %>%
#   group_by(group) %>%
#   summarise(n = n(),
#             m_reward = mean(mean_reward),
#             md_reward = median(mean_reward),
#             var_reward = var(mean_reward),
#             sd_reward = sd(mean_reward),
#             se_reward = sd_reward / sqrt(n),
#             lower_ci_reward = m_reward - qt(1 - (0.05 / 2), n - 1) * se_reward,
#             upper_ci_reward = m_reward + qt(1 - (0.05 / 2), n - 1) * se_reward) %>%
#   kable(., format = "html", escape = FALSE, digits = 2) %>%
#   kable_styling("striped", full_width = FALSE)
Figure 2 shows the overall performance of each group, based on each subject’s mean reward across all trials. Control participants achieved higher rewards than both PD patients on medication (\(t(65)=2.5\), \(p=.014\), \(d=0.6\), \(BF=3.5\)) and off medication (\(t(63)=7.2\), \(p<.001\), \(d=1.8\), \(BF>100\)). Notably, PD patients on medication achieved substantially higher rewards than patients off medication (\(t(62)=5.9\), \(p<.001\), \(d=1.5\), \(BF>100\)), indicating a strong beneficial effect of L-Dopa on the ability to balance exploration and exploitation.
PD+ vs. PD-: \(t(62)=5.9\), \(p<.001\), \(d=1.5\), \(BF>100\)
Control vs. PD-: \(t(63)=7.2\), \(p<.001\), \(d=1.8\), \(BF>100\)
Control vs. PD+: \(t(65)=2.5\), \(p=.014\), \(d=0.6\), \(BF=3.5\)
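Pairwise comparisons of this kind pair a frequentist two-sample t-test with a default Bayes factor from BayesFactor::ttestBF, as in the commented analysis code. The sketch below uses simulated data; the group means, standard deviations, and sample sizes are made up for illustration and are not the study’s values.

```r
# Sketch of a two-sample comparison; data are simulated, not the study's.
library(BayesFactor)

set.seed(1)
g1 <- rnorm(33, mean = 0.60, sd = 0.10)  # e.g., mean rewards, one group
g2 <- rnorm(32, mean = 0.50, sd = 0.10)  # e.g., mean rewards, another group

t.test(g1, g2, var.equal = TRUE)  # frequentist two-sample t-test
bf <- ttestBF(x = g1, y = g2)     # default JZS Bayes factor (BF10)
extractBF(bf)$bf                  # evidence for a group difference
```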
Code
# Boxplots of rewards by group
p_performance <- ggplot(df_mean_reward_subject,
                        aes(x = group, y = mean_reward, color = group,
                            fill = group, shape = group)) +
  geom_hline(yintercept = .5, linetype = 'dashed', color = 'black') +  # random choice model (mean across all 40 environments)
  geom_boxplot(alpha = 0.2, outlier.shape = NA, width = 0.4) +
  geom_jitter(width = 0.15, size = 2, alpha = 0.8) +
  stat_summary(fun = mean, geom = "point", shape = 23, fill = "white", size = 4) +
  scale_color_manual(values = groupcolors) +
  scale_fill_manual(values = groupcolors) +
  ylab("Mean normalized reward") +
  xlab("") +
  ggtitle("Performance") +
  theme_classic() +
  theme(legend.position = 'none',
        legend.title = element_blank(),
        plot.title = element_text(size = 24),
        axis.text = element_text(size = 18),
        axis.title = element_text(size = 18))

p_performance
ggsave("plots/performance.png", p_performance, dpi = 300, height = 5, width = 6)

# Control vs. PD+
# ttestBF(subset(df_mean_reward_subject, group == 'Control')$mean_reward,
#         subset(df_mean_reward_subject, group == 'PD+')$mean_reward, var.equal = TRUE)
# t.test(subset(df_mean_reward_subject, group == 'Control')$mean_reward,
#        subset(df_mean_reward_subject, group == 'PD+')$mean_reward, var.equal = T)

# Control vs. PD-
# ttestBF(subset(df_mean_reward_subject, group == 'Control')$mean_reward,
#         subset(df_mean_reward_subject, group == 'PD-')$mean_reward, var.equal = TRUE)
# t.test(subset(df_mean_reward_subject, group == 'Control')$mean_reward,
#        subset(df_mean_reward_subject, group == 'PD-')$mean_reward, var.equal = T)

# PD+ vs. PD-
# ttestBF(subset(df_mean_reward_subject, group == 'PD+')$mean_reward,
#         subset(df_mean_reward_subject, group == 'PD-')$mean_reward, var.equal = TRUE)
# t.test(subset(df_mean_reward_subject, group == 'PD+')$mean_reward,
#        subset(df_mean_reward_subject, group == 'PD-')$mean_reward, var.equal = T)

# Plot for Computational Psychiatry Conference (Tübingen, July 2025)
# p_performance_by_group_CPP <- ggplot(df_mean_reward_subject,
#                                      aes(x = group, y = mean_reward, color = group,
#                                          fill = group, shape = group)) +
#   geom_boxplot(alpha = 0.2, outlier.shape = NA, width = 0.5) +
#   geom_jitter(width = 0.15, size = 2, alpha = 0.8) +
#   stat_summary(fun = mean, geom = "point", shape = 23, fill = "white", size = 2) +
#   scale_color_manual(values = groupcolors) +
#   scale_fill_manual(values = groupcolors) +
#   ylab("Mean normalized reward") +
#   xlab("") +
#   ggtitle("Performance") +
#   theme_classic() +
#   theme(strip.background = element_blank(),
#         strip.text = element_text(color = "black", size = 20),
#         legend.position = 'none',
#         legend.title = element_blank(),
#         plot.title = element_text(size = 24),
#         axis.text = element_text(size = 20),
#         axis.title = element_text(size = 20))
# ggsave("plots/performance_by_group_CPP.png", p_performance_by_group_CPP, dpi = 300, height = 5, width = 6)
# ggsave("plots/performance_by_group_CPP.pdf", p_performance_by_group_CPP, height = 5, width = 6)
Figure 2: Obtained rewards by group. Each dot is one participant’s mean reward across all rounds and trials.
4.3 Performance: Learning curves
Participants’ learning curves (Figure 3) show the average reward obtained on each trial, aggregated across rounds. For both polyneuropathy patients (Control) and PD patients on medication (PD+), mean rewards increased as a round progressed, suggesting they effectively balanced exploration and exploitation to maximize rewards. In stark contrast, PD patients off medication (PD-) showed no improvement across trials.
TO DO: Add random reward as baseline
Figure 3: Learning curves, showing obtained mean reward for each trial, aggregated across rounds.
4.3.1 Performance: Role of physiological and cognitive assessments (BDI, MMSE, Hoehn-Yahr)
We also assessed patients in terms of their depressive symptoms (via BDI-II), cognitive functioning (via the Mini-Mental State Examination, MMSE), and severity of motor symptoms (via the Hoehn-Yahr scale; Parkinson’s disease patients only). We ran a hierarchical regression with reward as dependent variable and group, BDI score, and MMSE score as predictors, with random intercepts for participants to account for individual differences. This analysis yielded only an effect of group, suggesting that BDI and MMSE scores were not related to performance.
TO DO: Keep eye on MMSE scores, approaching significance
Code
# Hierarchical frequentist regression with random intercept:
# Reward as function of BDI and MMSE score (all patients)
lmer_performance_BDI_MMSE <- lmer(z ~ group + BDI + MMSE + (1 | id),
                                  data = subset(dat, trial > 0 & round %in% 2:9))
# summary(lmer_performance_BDI_MMSE)

tab_model(lmer_performance_BDI_MMSE,
          title = "Hierarchical regression results: Performance as function of BDI and MMSE score.",
          bpe = "mean")
Hierarchical regression results: Performance as function of BDI and MMSE score (dependent variable: z).

| Predictors  | Estimates | CI            | p      |
|-------------|-----------|---------------|--------|
| (Intercept) | 0.27      | -0.20 – 0.75  | 0.264  |
| group [PD+] | -0.05     | -0.08 – -0.01 | 0.005  |
| group [PD-] | -0.12     | -0.15 – -0.09 | <0.001 |
| BDI         | 0.00      | -0.00 – 0.00  | 0.869  |
| MMSE        | 0.01      | -0.00 – 0.03  | 0.100  |

Random effects: σ² = 0.06; τ00 (id) = 0.00; ICC = 0.06; N (id) = 97; Observations = 19,400; Marginal R² / Conditional R² = 0.037 / 0.094.
Code
# Hierarchical Bayesian regression with random intercept:
# Reward as function of BDI and MMSE score (all patients)
brm_performance_BDI_MMSE <- run_model(
  brm(z ~ group + BDI + MMSE + (1 | id),
      data = subset(dat, trial > 0 & round %in% 2:9),
      chains = 4, cores = 4,
      save_pars = save_pars(all = TRUE),
      seed = 0815,
      iter = 5000, warmup = 1000,
      backend = "cmdstanr",
      control = list(adapt_delta = 0.99, max_treedepth = 15)),
  # prior = prior(normal(0, 10), class = "b"),
  modelName = 'brm_performance_BDI_MMSE')

# tab_model(brm_performance_assessment, bpe = "mean",
#           title = "Hierarchical Bayesian regression: Performance as function of BDI and MMSE score.")
# bayes_R2(brm_performance_assessment)
# tab_model(lmer_performance_BDI_MMSE, brm_performance_BDI_MMSE,
#           title = "Hierarchical regression results: Performance as function of BDI and MMSE score.",
#           bpe = "mean")
Next, we ran a hierarchical regression for Parkinson’s patients only, with reward as dependent variable and group, BDI, MMSE, and Hoehn-Yahr score as predictors, with random intercepts for participants to account for individual differences. This analysis yielded only an influence of group, i.e., being on or off L-Dopa.
Code
# Hierarchical frequentist regression with random intercept:
# Reward as function of BDI, MMSE, and Hoehn-Yahr score (Parkinson's patients only)
lmer_reward_PD_only_BDI_MMSE_HY <- lmer(z ~ group + BDI + MMSE + hoehn_yahr + (1 | id),
                                        data = subset(dat, trial > 0 & round %in% 2:9 & group != "Control"))
# summary(lmer_reward_PD_only_BDI_MMSE_HY)

tab_model(lmer_reward_PD_only_BDI_MMSE_HY,
          title = "Hierarchical regression results: Performance of patients with Parkinson's disease as function of BDI, MMSE, and Hoehn-Yahr score.",
          bpe = "mean")
Hierarchical regression results: Performance of patients with Parkinson's disease as function of BDI, MMSE, and Hoehn-Yahr score (dependent variable: z).

| Predictors  | Estimates | CI            | p      |
|-------------|-----------|---------------|--------|
| (Intercept) | 0.45      | -0.07 – 0.98  | 0.093  |
| group [PD-] | -0.08     | -0.10 – -0.05 | <0.001 |
| BDI         | 0.00      | -0.00 – 0.01  | 0.157  |
| MMSE        | 0.00      | -0.01 – 0.02  | 0.607  |
| hoehn_yahr  | 0.01      | -0.01 – 0.03  | 0.456  |

Random effects: σ² = 0.06; τ00 (id) = 0.00; ICC = 0.04; N (id) = 64; Observations = 12,800; Marginal R² / Conditional R² = 0.025 / 0.063.
Code
# Hierarchical Bayesian regression with random intercept:
# Reward as function of BDI, MMSE, and Hoehn-Yahr score (Parkinson's patients only)
brm_performance_PD_only_BDI_MMSE_HY <- run_model(
  brm(z ~ group + BDI + MMSE + hoehn_yahr + (1 | id),
      data = subset(dat, trial > 0 & round %in% 2:9 & group != "Control"),
      cores = 4, chains = 4,
      seed = 0815,
      iter = 5000, warmup = 1000,
      backend = "cmdstanr",
      control = list(adapt_delta = 0.99)),
  # prior = prior(normal(0, 10), class = "b"),
  modelName = 'brm_performance_PD_only_BDI_MMSE_HY')

# tab_model(lmer_reward_PD_only_BDI_MMSE_HY, brm_performance_PD_only_BDI_MMSE_HY,
#           title = "Hierarchical regression results: Performance of patients with Parkinson's disease as function of BDI, MMSE, and Hoehn-Yahr score.",
#           bpe = "mean")
4.4 Exploration vs. exploitation choices
To investigate the temporal dynamics of exploration and exploitation, we determined for each trial whether the chosen tile was novel (an exploration decision) or had already been selected previously (an exploitation decision). Intuitively, at the beginning of each round learners should predominantly engage in exploration to identify high-reward options, and gradually shift toward exploitative behavior as they approach the end of the round.
Code
# Proportion of unique choices per round per subject
df_unique_choices_round <- dat %>%
  filter(round %in% 2:9 & trial > 0) %>%
  group_by(id, group, round) %>%
  summarize(total = n(),                      # number of trials
            unique_tiles = n_distinct(x, y),  # unique (x, y) combinations (i.e., tiles)
            repeat_tiles = total - unique_tiles) %>%
  mutate(prop_unique = unique_tiles / total,
         prop_repeat = repeat_tiles / total)

# Proportion of unique choices across 8 rounds per subject
df_unique_choices_subject <- df_unique_choices_round %>%
  group_by(id, group) %>%
  summarize(m_prop_unique = mean(prop_unique),
            m_prop_repeat = mean(prop_repeat))

dat <- dat %>%
  group_by(id, round) %>%
  arrange(trial, .by_group = TRUE) %>%  # ensure data is sorted by trial
  mutate(
    is_new = factor(if_else(!duplicated(chosen), "new", "repeat")),  # check uniqueness based on 'chosen' column
    is_new_label = factor(if_else(!duplicated(chosen), "Exploration", "Exploitation"),
                          levels = c("Exploration", "Exploitation"))
  ) %>%
  ungroup()

dat_repeat_prop <- dat %>%
  filter(trial > 0 & round %in% 2:9) %>%
  group_by(id, group, trial) %>%
  summarize(prop_repeat = mean(is_new == "repeat", na.rm = TRUE))  # proportion of "repeat" (exploitation) choices
  # prop_new = mean(is_new == "new", na.rm = TRUE)  # proportion of "new" (exploration) choices
Figure 4 shows that both Control and PD+ patients increased the amount of exploitation over time, indicating a goal-directed shift from exploring novel options to exploiting known high-value options. The Control group began exploiting earlier in the round and exhibited a stronger overall tendency toward exploitation than the PD+ group, suggesting that this earlier focus on exploitation underlies their better performance. In stark contrast, PD- patients predominantly engaged in exploration and showed only a weak tendency toward exploitation as the search horizon approached its end. This pattern is also reflected in the overall proportion of exploitation decisions (Figure 4, inset). Control patients made more exploitation decisions than PD+ patients (\(t(65)=2.6\), \(p=.011\), \(d=0.6\), \(BF=4.2\)), who in turn exploited more than PD- patients (\(t(62)=5.0\), \(p<.001\), \(d=1.3\), \(BF>100\)). Notably, PD- patients almost exclusively selected novel options during the task and only rarely exploited known options. These distinct behavioral patterns show how a suboptimal balance of exploration and exploitation reduces obtained rewards.
Figure 4: Balancing exploration and exploitation. Shown are the mean proportions of exploitation decisions, aggregated over trials and rounds. Each dot is one participant.
The mean proportion of repeat choices (=exploitation) differed among all groups, with patients with polyneuropathy (Control) showing higher levels of exploitation than both PD+ and PD- patients. Parkinson patients on medication (PD+) exploited more than patients off medication (PD-).
Control vs. PD+: \(t(65)=2.6\), \(p=.011\), \(d=0.6\), \(BF=4.2\)
Control vs. PD-: \(t(63)=6.5\), \(p<.001\), \(d=1.6\), \(BF>100\)
PD+ vs. PD-: \(t(62)=5.0\), \(p<.001\), \(d=1.3\), \(BF>100\)
4.5.1 Rewards obtained from exploration and exploitation
We also calculated the mean reward obtained from learners’ explore and exploit choices, respectively. Figure 6 shows that both Control and PD+ patients obtained higher rewards from their exploration choices than PD- patients, indicating better adaptation to the structure of the environment. Similarly, for exploitative choices, PD+ and Control patients obtained higher rewards than PD- patients, showing that PD- patients not only exploited less frequently but also did so less efficiently.
Figure 6: Mean rewards for explore versus exploit choices, averaged across all rounds and trials per participant. Each dot represents one participant.
Explore choices
Control vs. PD+: \(t(65)=1.0\), \(p=.311\), \(d=0.2\), \(BF=.39\)
Control vs. PD-: \(t(63)=3.4\), \(p=.001\), \(d=0.8\), \(BF=24\)
PD+ vs. PD-: \(t(62)=2.9\), \(p=.005\), \(d=0.7\), \(BF=7.6\)
Exploit choices
Control vs. PD+: \(t(60)=0.3\), \(p=.738\), \(d=0.1\), \(BF=.27\)
Control vs. PD-: \(t(49)=3.4\), \(p=.001\), \(d=1.0\), \(BF=24\)
PD+ vs. PD-: \(t(47)=3.2\), \(p=.002\), \(d=0.9\), \(BF=16\)
4.6 Spatial trajectories
4.6.1 Distance between consecutive choices
We next consider participants’ spatial search trajectories (distances between consecutive clicks). Distance is measured as the Manhattan distance between consecutive clicks, such that repeat clicks have distance 0, clicks on directly neighboring tiles have distance 1, and clicks further away have distances > 1.
The most frequent choice was to select a neighboring tile (distance = 1), reflecting a local search approach (Wu et al., 2018). On average, Control patients had the shortest distances, indicating more local searches and repeated clicks. PD+ patients had greater distances than Control but shorter ones than PD- patients, who showed the greatest distances. The distribution of distances shows that this is primarily due to the few repeat choices (distance = 0) PD- patients made, i.e., their very limited exploitation behavior.
Control patients had lower search distances than the PD+ group (\(t(65)=-2.5\), \(p=.015\), \(d=0.6\), \(BF=3.4\)) and the PD- group (\(t(63)=-2.3\), \(p=.024\), \(d=0.6\), \(BF=2.3\)). There was no difference between Parkinson’s patients on (PD+) and off (PD-) medication (\(t(62)=-0.4\), \(p=.661\), \(d=0.1\), \(BF=.28\)).
4.6.2 Types of choices
We can also categorize each consecutive click as “repeat” (clicking the same tile as on the previous trial), “near” (clicking a directly neighboring tile, i.e., distance = 1), or “far” (clicking a tile with distance > 1). We first computed for each participant the proportion of each choice type across all 8 rounds \(\times\) 25 clicks = 200 search decisions and then plot the mean proportion for each group.
The analyses reveal distinct search patterns across patient groups. Control participants had the highest proportion of repeat (exploit) decisions, followed by the PD+ group; the proportion of repeat decisions in the PD- group was minimal. These behaviors help explain the differences in learning curves, where Control patients showed the greatest improvement, followed by PD+ patients. In contrast, PD- patients exhibited no improvement across trials, due to their lower tendency to exploit high-reward options.
Figure 7: Types of search decisions in terms of distance.
An analysis of consecutive choice types over time reveals clear differences in search behavior between the groups. Both Control and PD+ patients adapt their strategies as a round progresses by decreasing the number of near (distance = 1) and far (distance > 1) choices while increasing the number of exploit decisions, indicating a shift from exploration to exploitation. Notably, the data indicate a faster shift to exploitation for Control patients compared to PD+ patients, with an earlier and stronger preference for re-selecting known high-reward options. In contrast, PD- patients show limited adaptation, with the proportions of each decision type remaining relatively stable throughout the round, aside from a slight increase in exploit decisions.
Finally, we analysed the relation between the value of the reward obtained at time \(t\) and the search distance on the subsequent trial \(t+1\). If a large reward was obtained, participants should search more locally; conversely, if a low reward was obtained, they should be more likely to search farther away.
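A minimal sketch of how a previous-reward variable can be derived by lagging rewards within each round (base R; the toy data frame and its values are hypothetical):

```r
# one-step lag: reward obtained on the preceding trial
lag1 <- function(x) c(NA, x[-length(x)])

toy <- data.frame(id = 1, round = 1, trial = 0:3, reward = c(20, 35, 12, 40))
toy$previous_reward <- with(toy, ave(reward, id, round, FUN = lag1))
toy$previous_reward  # NA 20 35 12 (first trial has no predecessor)
```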
Across all trials and rounds, search distance and previous reward were negatively correlated, indicating that participants tended to search further away following lower rewards compared to higher rewards. This relationship was stronger in Control patients (\(r=-.43\), \(p<.001\), \(BF>100\)) and PD+ patients (\(r=-.35\), \(p<.001\), \(BF>100\)) compared to PD- patients (\(r=-.17\), \(p<.001\), \(BF>100\)). These findings suggest that PD patients off medication exhibited less adaptive search behavior than those on medication and individuals with polyneuropathies.
Code
# correlation of previous reward and distance of consecutive choices, by group
# overall, ignoring within-subject factor
# dat %>%
#   filter(trial != 0 & round %in% 2:9) %>% # exclude first (randomly revealed) tile, practice round, and bonus round
#   group_by(group) %>%
#   summarise(corTestPretty(previous_reward, distance))

# mean correlation between distance and reward obtained on previous step,
# first aggregated within each round and then within each subject,
# such that there is one correlation for each subject
# reward_distance_cor <- dat %>%
#   filter(trial != 0 & round %in% 2:9) %>% # exclude first (randomly revealed) tile, practice round, and bonus round
#   group_by(id, round, group) %>%
#   summarise(cor = cor(previous_reward, distance)) %>%
#   mutate(cor = replace_na(cor, 0)) %>% # in some rounds subjects clicked the same tile throughout; set cor = 0
#   ungroup() %>%
#   group_by(id, group) %>%
#   summarise(mean_cor = mean(cor))

# mean correlation between distance and reward obtained on previous step as function of group
# reward_distance_cor %>%
#   group_by(group) %>%
#   summarise(n = n(),
#             m_cor = mean(mean_cor),
#             SD_cor = sd(mean_cor),
#             se_cor = SD_cor / sqrt(n),
#             lower_ci_cor = m_cor - qt(1 - (0.05 / 2), n - 1) * se_cor,
#             upper_ci_cor = m_cor + qt(1 - (0.05 / 2), n - 1) * se_cor)

# plot regression lines based on raw data
# ggplot(subset(dat, trial > 0 & round %in% 2:9), aes(x = previous_reward, y = distance, color = group)) +
#   facet_wrap(~group) +
#   geom_jitter(alpha = 0.3, width = 0.1, height = 0.1) +
#   geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
#   # ggtitle("Regression Lines for Distance by Previous Reward and Group") +
#   theme_minimal() +
#   xlab("Previous Reward") +
#   ylab("Distance")
Given the nested structure of the data, we next employed a Bayesian hierarchical regression analysis to predict search distance from the reward obtained on the previous step, with previous reward, group, and their interaction as population-level (fixed) effects and subject-wise random intercepts. These analyses show that both the magnitude of the reward obtained on the last step and group influence search distance. Notably, PD patients off medication (PD-) adapted their search behavior less in response to reward magnitude, while patients on medication (PD+) exhibited adaptation levels close to the Control group.
Code
# for now, random intercepts only; random intercept + random slope not stable
# lmer_distance_reward <- lmer(distance ~ previous_reward * group + (previous_reward + group | id),
#                              data = subset(dat, trial > 0 & round %in% 2:9))

# fit model
lmer_distance_reward <- lmer(distance ~ previous_reward * group + (1 | id),
                             data = subset(dat, trial > 0 & round %in% 2:9))
# summary(lmer_distance_reward)
# emmeans(lmer_distance_reward, pairwise ~ previous_reward | group, pbkrtest.limit = 15000)

p_lmer_distance_reward <- plot_model(lmer_distance_reward, type = "pred",
                                     terms = c("previous_reward", "group")) +
  stat_summary(dat, mapping = aes(x = previous_reward, y = distance, color = group,
                                  fill = group, shape = group),
               fun = mean, geom = 'point', alpha = 0.7, size = 1, na.rm = TRUE) +
  # scale_x_continuous('Previous Reward', breaks = c(0, 10, 20, 30, 40, 50)) +
  ylab('Distance to Next Option') +
  scale_fill_manual(values = groupcolors) +
  scale_color_manual(values = groupcolors) +
  ggtitle('Search Distance ~ Previous Reward (lmer)') +
  theme_classic() +
  theme(legend.position = "inside",
        legend.position.inside = c(0.85, 0.9),
        legend.justification = c(1, 1),
        legend.title = element_blank(),
        legend.box.background = element_blank(),
        legend.key = element_rect(fill = "white"),
        axis.text = element_text(colour = "black", size = 14),
        axis.title = element_text(colour = "black", size = 14)) +
  guides(color = guide_legend(override.aes = list(fill = NA, size = 2)))

# p_lmer_distance_reward$layers[[2]]$show.legend <- FALSE
# p_lmer_distance_reward
ggsave("plots/regression_distance_reward_lmer.png", p_lmer_distance_reward,
       dpi = 300, height = 5, width = 7)
Code
# Bayesian regression analysis
# run_model() is a wrapper for brm models such that it saves the full model the first time
# it is run, otherwise it loads it from disk from directory `~brm`
# Fixed effects: previous_reward and group.
# Random effects: random slopes and a random intercept for both previous_reward and group
# by id, i.e., the effect of previous_reward and group can vary across individuals (id).

# random intercept and random slope
# brm_distance_reward <- run_model(brm(distance ~ previous_reward * group + (previous_reward + group | id),

# random intercept
brm_distance_reward <- run_model(
  brm(distance ~ previous_reward * group + (1 | id),
      data = subset(dat, trial > 0 & round %in% 2:9),
      cores = 4,
      chains = 4,
      backend = "cmdstanr",
      seed = 0815,
      iter = 5000,
      warmup = 1000,
      control = list(adapt_delta = 0.99, max_treedepth = 15)),
  # prior = prior(normal(0, 10), class = "b")),
  modelName = 'brm_distance_reward')

# tab_model(brm_distance_reward, bpe = "mean", title = "Bayesian regression results: Search distance as function of reward on previous step.")
# bayes_R2(brm_distance_reward)

# generate predictions manually (otherwise difficult to plot the mean empirical values per geom_point)
# prevReward <- seq(-3, 55) / 50 # normalized reward
prevReward <- seq(round(min(dat$previous_reward, na.rm = TRUE), 1),
                  round(max(dat$previous_reward, na.rm = TRUE), 1), 0.1) # normalized reward
group <- levels(dat$group)
newdat <- expand.grid(previous_reward = prevReward, group = group)

# predict distance based on previous reward
preds <- fitted(brm_distance_reward, re_formula = NA, newdata = newdat, probs = c(.025, .975))
predsDF <- data.frame(previous_reward = rep(prevReward, 3),
                      group = rep(levels(dat$group), each = length(prevReward)),
                      distance = preds[, 1],
                      lower = preds[, 3],
                      upper = preds[, 4])

# average distance on the grid
grid <- expand.grid(x1 = 0:7, x2 = 0:7, y1 = 0:7, y2 = 0:7)
grid$distance <- NA
for (i in 1:dim(grid)[1]) {
  grid$distance[i] <- dist(rbind(c(grid$x1[i], grid$x2[i]),
                                 c(grid$y1[i], grid$y2[i])), method = "manhattan")
}
meanDist <- mean(grid$distance)

# plot predictions
p_regression_distance_reward <- ggplot() +
  stat_summary(dat, mapping = aes(x = previous_reward, y = distance, color = group, fill = group),
               fun = mean, geom = 'point', alpha = 0.7, size = 1, na.rm = TRUE) +
  geom_line(predsDF, mapping = aes(x = previous_reward, y = distance, color = group), linewidth = 1) +
  geom_ribbon(predsDF, mapping = aes(x = previous_reward, y = distance, ymin = lower,
                                     ymax = upper, fill = group), alpha = .3) +
  # geom_hline(yintercept = meanDist, linetype = 'dashed', color = 'red') + # mean distance
  # xlab('Normalized Previous Reward') +
  coord_cartesian(ylim = c(0, 5), xlim = c(-0.01, 1)) +
  scale_x_continuous(name = "Previous normalized reward",
                     breaks = seq(0, 1, 0.1),
                     labels = sprintf("%.1f", seq(0, 1, 0.1))) +
  ylab('Distance to next chosen option') +
  scale_fill_manual(values = groupcolors) +
  scale_color_manual(values = groupcolors) +
  labs(title = "Search Distance ~ Previous Reward"
       # subtitle = "(Bayesian hierarchical regression)"
  ) +
  theme_classic() +
  theme(legend.position = "inside",
        legend.position.inside = c(0.85, 0.9),
        legend.justification = c(1, 1),
        legend.title = element_blank(),
        legend.text = element_text(colour = "black", size = 18),
        plot.title = element_text(colour = "black", size = 24),
        plot.subtitle = element_text(colour = "black", size = 18),
        axis.text = element_text(colour = "black", size = 18),
        axis.title = element_text(colour = "black", size = 18))

p_regression_distance_reward
ggsave("plots/regression_distance_reward_brms.png", p_regression_distance_reward, dpi = 300, height = 5, width = 7)
ggsave("plots/regression_distance_reward_brms.pdf", p_regression_distance_reward, height = 5, width = 7)

# mean values by previous reward and group
# df_summary <- dat %>%
#   group_by(previous_reward, group) %>%
#   summarise(mean_distance = mean(distance, na.rm = TRUE), .groups = "drop")

# Plot for Computational Psychiatry Conference (Tübingen, July 2025)
# ggplot() +
#   stat_summary(dat, mapping = aes(x = previous_reward, y = distance, color = group, fill = group),
#                fun = mean, geom = 'point', alpha = 0.7, size = 1, na.rm = TRUE) +
#   geom_line(predsDF, mapping = aes(x = previous_reward, y = distance, color = group), linewidth = 1) +
#   geom_ribbon(predsDF, mapping = aes(x = previous_reward, y = distance, ymin = lower,
#                                      ymax = upper, fill = group), alpha = .3) +
#   # geom_hline(yintercept = meanDist, linetype = 'dashed', color = 'red') + # mean distance
#   scale_x_continuous('Previous Reward', breaks = seq(0, 50, 10), labels = c(0, 10, 20, 30, 40, 50)) +
#   scale_y_continuous('Distance to next choice', breaks = seq(0, 7, 1), labels = c(0, 1, 2, 3, 4, 5, 6, 7)) +
#   coord_cartesian(ylim = c(0, 5)) +
#   ylab('Distance to next choice') +
#   scale_fill_manual(values = groupcolors) +
#   scale_color_manual(values = groupcolors) +
#   labs(title = "Search Distance ~ Previous Reward",
#        subtitle = "(Bayesian hierarchical regression)") +
#   theme_classic() +
#   theme(legend.position = "inside",
#         legend.position.inside = c(0.85, 0.9),
#         legend.justification = c(1, 1),
#         legend.title = element_blank(),
#         legend.text = element_text(colour = "black", size = 18),
#         plot.title = element_text(colour = "black", size = 22),
#         plot.subtitle = element_text(colour = "black", size = 18),
#         axis.text = element_text(colour = "black", size = 18),
#         axis.title = element_text(colour = "black", size = 18))
#
# ggsave("plots/regression_distance_reward_brms_CPP.png", dpi = 300, height = 5, width = 6)
# ggsave("plots/regression_distance_reward_brms_CPP.pdf", height = 5, width = 6)
Figure 8: Bayesian hierarchical regression analysis with search distance as function of previous reward. Dots are the empirical mean distances for each reward value, aggregated over participants, trials, and rounds.
4.7 Bonus round judgments
In the bonus round, participants made 15 search decisions and then predicted the rewards for 5 randomly chosen, previously unobserved tiles. Subsequently, they chose one of the five tiles and continued the round until the search horizon of 25 clicks was met.
Data frame dat_bonus contains the following variables:
id: participant id
bonus_env_number: internal id of the bonus round environment
bonus_environment: recodes the environment condition, e.g., Smooth (high spatial correlation) or Rough (low spatial correlation)
x and y: the coordinates of the random tiles on the grid for which participants were asked to provide reward estimates
howSecure: participant confidence for given reward judgment (scale 0-10)
chosen_x and chosen_y: the coordinates of the tile chosen after making reward and confidence judgments for the 5 random tiles
true_z is the ground truth, i.e. true expected reward of a tile
error: the absolute deviation between participants’ reward estimates (givenValue) and the ground truth (true_z)
chosen is whether the option was chosen or not (participants chose one of the five options after estimating their value and confidence in their reward prediction)
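For illustration, the error column follows directly from givenValue and true_z; the values below are taken from the first rows of Table 3:

```r
# absolute deviation between reward estimate and ground truth
givenValue <- c(20, 26, 16)
true_z <- c(16.34, 16.16, 38.27)
error <- abs(givenValue - true_z)
round(error, 2)  # 3.66 9.84 22.27
```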
Note
Charley: is this scaling still correct (taken from YKWG code)? bonus_environment$z <- bonus_environment$z * scale_factor + 5
Table 3: Bonus round data.
| id  | bonus_env_number | bonus_environment | x | y | givenValue | howSecure | chosen_x | chosen_y | true_z | chosen     | error | group   |
|-----|------------------|-------------------|---|---|------------|-----------|----------|----------|--------|------------|-------|---------|
| 111 | 38               | Rough             | 5 | 6 | 20         | 5         | 7        | 3        | 16.34  | not chosen | 3.66  | Control |
| 111 | 38               | Rough             | 2 | 7 | 26         | 4         | 7        | 3        | 16.16  | not chosen | 9.84  | Control |
| 111 | 38               | Rough             | 7 | 3 | 16         | 5         | 7        | 3        | 38.27  | chosen     | 22.27 | Control |
| 111 | 38               | Rough             | 0 | 7 | 28         | 3         | 7        | 3        | 23.99  | not chosen | 4.01  | Control |
| 111 | 38               | Rough             | 7 | 6 | 30         | 5         | 7        | 3        | 34.10  | not chosen | 4.10  | Control |
| 115 | 39               | Rough             | 0 | 0 | 19         | 4         | 3        | 1        | 25.86  | not chosen | 6.86  | Control |
4.7.1 Prediction error
Figure 9 shows the mean absolute error between participants’ estimates and the true underlying expected reward, for each group and environment. All groups performed better than a random chance baseline. Pairwise comparisons of prediction error between groups:
Control vs. PD+: \(t(60)=-1.2\), \(p=.221\), \(d=0.3\), \(BF=.49\)
Control vs. PD-: \(t(57)=-2.6\), \(p=.012\), \(d=0.7\), \(BF=4.1\)
PD+ vs. PD-: \(t(53)=-1.2\), \(p=.249\), \(d=0.3\), \(BF=.48\)
Figure 9: Prediction error of bonus round judgments. The red dotted line indicates a random baseline.
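One way to construct such a chance-level reference is Monte Carlo simulation of uniform random guesses; this is a sketch under an assumed 0–50 reward scale, not necessarily the exact baseline used in the figure:

```r
# random-guessing baseline for the mean absolute prediction error
set.seed(1)
true_z <- runif(1e4, 0, 50)        # hypothetical true tile values
random_guess <- runif(1e4, 0, 50)  # uniform random estimates
mean(abs(random_guess - true_z))   # approx. 50/3 for two independent uniforms
```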
4.7.2 Prediction error and confidence
Code
# Across all judgments and participants, there was no systematic relation between confidence and prediction error:
# corTestPretty(dat_bonus$error, dat_bonus$howSecure, method = "kendall")
# cor.test(dat_bonus$error, dat_bonus$howSecure, method = "kendall")
# correlationBF(dat_bonus$error, dat_bonus$howSecure, method = "kendall")
A Bayesian regression with prediction error as the dependent variable, confidence, group, and their interaction as population-level (“fixed”) effects, and a random intercept for participants showed that for Control patients confidence and prediction error were negatively correlated (i.e., lower confidence was associated with higher error), whereas for the two Parkinson groups there was no relation.
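A simplified, non-hierarchical sketch of this analysis on simulated data (lm() in place of the brms mixed model; group labels and effect sizes are invented purely for illustration):

```r
set.seed(42)
n <- 300
group <- factor(rep(c("Control", "PD+", "PD-"), each = n / 3))
howSecure <- sample(0:10, n, replace = TRUE)
# build in a negative confidence-error slope for the Control group only
slope <- ifelse(group == "Control", -0.8, 0)
error <- 10 + slope * howSecure + rnorm(n, sd = 2)

fit <- lm(error ~ howSecure * group)
coef(fit)["howSecure"]  # negative: lower confidence, higher error in Controls
```

The interaction terms (howSecure:groupPD+ and howSecure:groupPD-) then test whether the confidence-error slope differs for the Parkinson groups relative to the Control reference level.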
To analyze selected and non-selected options, we first averaged the predicted reward and confidence of the not-chosen tiles within subjects, and then compared chosen and not-chosen options. Selected tiles tended to have higher predicted rewards. However, participants were not more confident in the selected options, and selected tiles did not have higher true rewards than non-selected tiles.
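The within-subject aggregation can be sketched with base R aggregate() on toy data (values invented; chosen options are set higher to mirror the reported pattern):

```r
db <- data.frame(
  id = rep(1:2, each = 5),
  givenValue = c(20, 26, 35, 28, 30,   # subject 1 (3rd tile chosen)
                 33, 22, 31, 18, 25),  # subject 2 (1st tile chosen)
  chosen = c("not chosen", "not chosen", "chosen", "not chosen", "not chosen",
             "chosen", "not chosen", "not chosen", "not chosen", "not chosen")
)
# mean predicted reward per subject, separately for chosen vs. not-chosen tiles
aggregate(givenValue ~ id + chosen, data = db, FUN = mean)
```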
5.3 Performance as a function of BDI, MMSE, and Hoehn-Yahr
5.3.1 Performance as a function of depression score (BDI-II)
The plots show performance as a function of depression score (BDI-II), separately for each group. Opposing trends were found in the different groups: for patients with polyneuropathy (Control), there was a negative relation, such that patients with higher depression scores obtained lower rewards. For the two Parkinson groups, the relation was positive, such that patients reporting more severe symptoms obtained higher rewards.
Figure 15: Performance as a function of Mini-Mental State Examination (MMSE).
5.3.3 Performance as a function of Hoehn-Yahr (Parkinson patients only)
The Hoehn-Yahr scale provides basic information about the severity of motor impairments in Parkinson’s disease, with higher scores indicating greater severity.
Frank et al. (2004) Cognitive reinforcement learning in parkinsonism
Demonstrated that Parkinson’s patients off medication learn poorly from positive feedback, implicating dopamine in reward learning and adaptive action selection.
Frank et al. (2004): Cognitive reinforcement learning in parkinsonism. Science, 306, 1940–1943
Daw et al. (2006) Cortical substrates for exploratory decisions in humans
Showed that exploratory decisions activate frontopolar cortex and are modulated by uncertainty, supporting a neural mechanism for explore–exploit arbitration.
Daw et al. (2006): Cortical substrates for exploratory decisions in humans. Nature, 441, 876–879
Cohen et al. (2007) How the brain manages the exploration–exploitation tradeoff
Proposed a theoretical framework linking locus coeruleus–norepinephrine function to exploration, via regulation of cortical neural gain.
Cohen et al. (2007): How the brain manages the exploration–exploitation dilemma. Phil. Trans. R. Soc. B, 362, 933–942
Frank et al. (2009) Prefrontal and striatal dopaminergic genes predict individual differences in exploration and exploitation
Found that genetic variation in dopamine pathways predicts differences in both uncertainty-directed and random exploration.
Frank et al. (2009): Dopaminergic genes predict individual differences in exploration and exploitation. Nat Neurosci, 12, 1062–1068
Kayser et al. (2015) Dopamine, locus of control, and the exploration–exploitation tradeoff
Showed that dopamine synthesis capacity interacts with locus of control to shape exploration strategies.
Kayser et al. (2015): Dopamine, locus of control, and the exploration–exploitation tradeoff. Neuropsychopharmacology, 40, 454–462
Addicott et al. (2017) A primer on the explore/exploit trade-off for psychiatry
Reviewed relevance of explore/exploit mechanisms to psychiatric disorders and their utility in clinical research.
Addicott et al. (2017): A primer on the explore/exploit trade-off for psychiatry. Neuropsychopharmacology, 42, 1931–1939
Gershman & Tzovaras (2018) Dopaminergic genes are associated with both directed and random exploration
Found that polymorphisms in dopaminergic genes relate selectively to directed exploration, not random variability.
Gershman & Tzovaras (2018): Dopaminergic genes and exploration. Neuropsychologia, 120, 97–104
Cinotti et al. (2019) Dopamine blockade impairs the exploration–exploitation trade-off in rats
Provided causal evidence that dopamine D2 receptor antagonism reduces exploratory choices in a multi-armed bandit task.
Cinotti et al. (2019): Dopamine blockade impairs the explore–exploit trade-off. Sci Rep, 9, 6770
Chakroun et al. (2020) Dopaminergic modulation of the exploration/exploitation trade-off in human decision-making
Showed that L-DOPA boosts uncertainty-directed exploration but not random exploration in humans.
Chakroun et al. (2020): Dopaminergic modulation of exploration. eLife, 9, e51260
Cremer et al. (2023) Disentangling the roles of dopamine and noradrenaline in the exploration–exploitation trade-off
Found that dopamine primarily modulates directed exploration, while noradrenaline affects choice variability.
Cremer et al. (2023): Disentangling roles of dopamine and noradrenaline in explore/exploit. Neuropsychopharmacology, 48, 1078–1086
Chen et al. (2024) Dopamine and norepinephrine mediate the explore/exploit trade-off in humans
Combined pupillometry, pharmacology, and fMRI to show distinct roles of dopamine and norepinephrine in exploration and exploitation.
Chen et al. (2024): Dopamine and norepinephrine mediate explore/exploit tradeoff. J Neurosci, 44
References
Abbott, A. (2010). Levodopa: The story so far. Nature, 466(7310), S6–S7.
Beck, A. T., Steer, R. A., Brown, G. K., et al. (1996). Beck depression inventory.
Folstein, M. F., Folstein, S. E., & McHugh, P. R. (1975). “Mini-mental state”: A practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research, 12(3), 189–198.
Giron, A. P., Ciranka, S., Schulz, E., van den Bos, W., Ruggeri, A., Meder, B., & Wu, C. M. (2023). Developmental changes in exploration resemble stochastic optimization. Nature Human Behaviour, 7(11), 1955–1967. https://doi.org/10.1038/s41562-023-01662-1
Goetz, C. G., Poewe, W., Rascol, O., Sampaio, C., Stebbins, G. T., Counsell, C., Giladi, N., Holloway, R. G., Moore, C. G., Wenning, G. K., et al. (2004). Movement Disorder Society Task Force report on the Hoehn and Yahr staging scale: Status and recommendations. Movement Disorders, 19(9), 1020–1028.
Hautzinger, M., Keller, F., & Kühner, C. (2006). Beck depressions-inventar (BDI-II). Harcourt Test Services.
Hoehn, M. M., & Yahr, M. D. (1967). Parkinsonism: Onset, progression, and mortality. Neurology, 17(5), 427–427.
Meder, B., Wu, C. M., Schulz, E., & Ruggeri, A. (2021). Development of directed and random exploration in children. Developmental Science, 24(4), e13095. https://doi.org/10.1111/desc.13095
Sadeghiyeh, H., Wang, S., Alberhasky, M. R., Kyllo, H. M., Shenhav, A., & Wilson, R. C. (2020). Temporal discounting correlates with directed exploration but not with random exploration. Scientific Reports, 10(1), 4020.
Schulz, E., Wu, C. M., Ruggeri, A., & Meder, B. (2019). Searching for rewards like a child means less generalization and more directed exploration. Psychological Science, 30(11), 1561–1572. https://doi.org/10.1177/0956797619863663
Tambasco, N., Romoli, M., & Calabresi, P. (2018). Levodopa in Parkinson’s disease: Current status and future developments. Current Neuropharmacology, 16(8), 1239–1252.
Wu, C. M., Schulz, E., Speekenbrink, M., Nelson, J. D., & Meder, B. (2018). Generalization guides human exploration in vast decision spaces. Nature Human Behaviour, 2, 915–924. https://doi.org/10.1038/s41562-018-0467-4